CORE-680: Add OBI distro#4468
Conversation
b8375fc to
13ac9be
Compare
6445339 to
7321f97
Compare
49cbb8e to
fe069ae
Compare
| - '' | ||
| resources: | ||
| - pods | ||
| - services |
There was a problem hiding this comment.
OBI uses service discovery out of the box, not sure if it's specifically for trace decoration or just cluster wide discovery. I'll check if it's configurable, otherwise the odiglet logs a bunch of permission errors from OBI
03d748c to
b8ef2d1
Compare
| func obiLogger() *commonlogger.OdigosLogger { | ||
| return commonlogger.LoggerCompat().With("subsystem", "ebpfobi") | ||
| } |
There was a problem hiding this comment.
think this will create a new logger instance on every call?
maybe worth to consider putting the logger in the factory
In addition some of the logging might be too verbose and in general it is better to let the caller (the instrumentation manager) handle any logging for errors or meaningful events if possible
| type obiInstrumentation struct { | ||
| selector *discover.DynamicPIDSelector | ||
| pid int | ||
| factory *OBIInstrumentationFactory |
There was a problem hiding this comment.
I think it would be cleaner to either:
- have the factory pass a cleanup function
- have the factory pass a context and cancel function
Instead of each instrumentation object having a pointer to the factory.
There was a problem hiding this comment.
I tried that but we need to be able to write back to the factory somehow when an Instrumentation object cancels the OBI instrumenter so that the factory knows to restart the OBI instrumenter on the next instrumentation
There was a problem hiding this comment.
@damemi why is it not possible with a cleanup function?
There was a problem hiding this comment.
@blumamir the cleanup function would still need to have a reference back to the factory anyway (to signal that the next instrumentation should re-start the instrumenter), so it's essentially just a closure with the logic that's already here
| go.opentelemetry.io/otel v1.43.0 | ||
| go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.43.0 | ||
| go.opentelemetry.io/otel/metric v1.43.0 | ||
| go.opentelemetry.io/otel/sdk v1.43.0 | ||
| golang.org/x/sync v0.20.0 | ||
| golang.org/x/sys v0.42.0 | ||
| google.golang.org/grpc v1.79.3 | ||
| google.golang.org/grpc v1.80.0 | ||
| google.golang.org/protobuf v1.36.11 | ||
| k8s.io/api v0.35.2 | ||
| k8s.io/apimachinery v0.35.2 | ||
| k8s.io/client-go v0.35.2 | ||
| k8s.io/api v0.35.3 | ||
| k8s.io/apimachinery v0.35.3 | ||
| k8s.io/client-go v0.35.3 |
There was a problem hiding this comment.
I thought the point of making obi a separate module was so it won't make us align to its dependencies - if that's the case I'd expect to not have all these changes in this PR - or if they are un avoidable. - to remove the separate module for obi
There was a problem hiding this comment.
the point of the separate module is so we can import into enterprise without getting the ambiguous go-auto import
4216c2e to
73372d4
Compare
RonFed
left a comment
There was a problem hiding this comment.
lgtm, still think we need to checkout consuming the archive OBI publishes thay you shared with me before merging with the current Dockerfile/Makefile changes
blumamir
left a comment
There was a problem hiding this comment.
added few comments/suggestions, but overall looks good.
would love to chat about it before merging
| // IsProgrammingLanguageWildcard reports whether lang is the distro wildcard meaning "any language". | ||
| func IsProgrammingLanguageWildcard(lang ProgrammingLanguage) bool { | ||
| return strings.TrimSpace(string(lang)) == string(ProgrammingLanguageWildcard) | ||
| } |
There was a problem hiding this comment.
for other languages we compare directly with == instead of adding a a util function. why is it different for the * language?
There was a problem hiding this comment.
we also have java-ebpf-instrumentations in enterprise. should we use the same here (instrumentations with an s at the end) to keep things consistent?
also, since the mechanism is a bit different, I wonder if this can cause confusion for users
There was a problem hiding this comment.
I don't think so, I named this literally after OBI "opentelemetry-ebpf-instrumentation" the project itself for clarity https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation
| if common.IsProgrammingLanguageWildcard(distro.Language) { | ||
| continue | ||
| } |
There was a problem hiding this comment.
could be nice to add support for this in the rule someday in the duture
| github.com/aws/aws-sdk-go-v2 v1.41.3 // indirect | ||
| github.com/aws/aws-sdk-go-v2/config v1.32.11 // indirect | ||
| github.com/aws/aws-sdk-go-v2/credentials v1.19.11 // indirect | ||
| github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.19 // indirect | ||
| github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.19 // indirect | ||
| github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.19 // indirect | ||
| github.com/aws/aws-sdk-go-v2/internal/ini v1.8.5 // indirect | ||
| github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.6 // indirect | ||
| github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.19 // indirect | ||
| github.com/aws/aws-sdk-go-v2/service/signin v1.0.7 // indirect | ||
| github.com/aws/aws-sdk-go-v2/service/sso v1.30.12 // indirect | ||
| github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.16 // indirect | ||
| github.com/aws/aws-sdk-go-v2/service/sts v1.41.8 // indirect | ||
| github.com/aws/smithy-go v1.24.2 // indirect |
There was a problem hiding this comment.
could be nice not to pull these. I guess obi is a bundle of everything and cannot be composed of just what needed?
|
|
||
| debug-build-odiglet: generate | ||
| go build -o odiglet -gcflags "all=-N -l" cmd/main.go | ||
| CGO_ENABLED=0 go build -a -o odiglet -gcflags "all=-N -l" cmd/main.go |
There was a problem hiding this comment.
| CGO_ENABLED=0 go build -a -o odiglet -gcflags "all=-N -l" cmd/main.go | |
| $(GOBUILD) -o odiglet -gcflags "all=-N -l" cmd/main.go |
| type obiInstrumentation struct { | ||
| selector *discover.DynamicPIDSelector | ||
| pid int | ||
| factory *OBIInstrumentationFactory |
There was a problem hiding this comment.
@damemi why is it not possible with a cleanup function?
7205983 to
46f4458
Compare
46f4458 to
92f0ca1
Compare
What this PR does / why we need it:
This adds
opentelemetry-ebpf-instrumentation(OBI) as a supported workload instrumentation distro.The benefit of this is to provide additional coverage and offer users a way to deploy OBI within our platform if they'd like to.
It relies on features we added upstream to OBI in open-telemetry/opentelemetry-ebpf-instrumentation#1321 and open-telemetry/opentelemetry-ebpf-instrumentation#1388 to support dynamic PID selection (as opposed to OBI's static service/exe config approach).
The OBI
Instrumenteris a single long-lived routine that handles process events, filters/matches them based on config criteria (ie, which PIDs we want), and attaches shared eBPF programs to those processes. TheDynamicSelectorwe added upstream provides a hook into theInstrumenterto update the filter/matching criteria during runtime without needing to restart OBI.This adds OBI as an sdk/distro that can be used with any language. The basic flow of the new distro is:
NewOBIInstrumentationFactory(): Initialize the OBI config and DynamicSelector, but does not start the OBIInstrumenterroutine yet. This is to prevent theInstrumenterfrom running and using resources if nothing is actually instrumented by OBI.factory.CreateInstrumentation(ctx, pid): If the OBIInstrumenteris not started yet, this starts it and returns anobiInstrumentationhandle that stores that DynamicSelector and the instrumented PID. This does not attach the PID to OBI yet.obiInstrumentation.Load(): Uses the stored PID in theoinstrumentation object to callDynamicSelector.AddPIDs(pid)which updates the running OBI manager to include/attach this process.obiInstrumentation.Close(): Removes the stored PID inowithDynamicSelect.RemovePIDs(pid)to update the OBI manager to exclude/detach this process.A caveat to this design is that it's difficult to stop the Instrumenter once it's started (ie, all OBI processes get uninstrumented and we no longer need that routine) without complex async management. We could do something like this on
Close():But that would require a mutex, and it's possible that another OBI app could come along in the meantime waiting to be instrumented while that mutex is held, and the Instrumenter gets cancelled before the new app gets added. Which would be very confusing to debug. I tried a lot of different approaches and they were all messy. I think some more upstream OBI changes around an actual handle on the instrumenter would help, along with other use cases for holding a reference to the instrumenter itself.
Overall:
instrumentationused by OdigletAnother possible upstream contribution would be a signal from OBI for when it's actually started/running/ready to accept new PIDs which Load() could wait on
This whole approach is slightly different from languages like Go, where the Odiglet is the long lived process, so each
CreateInstrumentation()in Go calls the go-auto framework to load, attach, and manage. In the case of OBI, the OBI Instrumenter handles loading, attaching, and managing so Odiglet is a wrapped layer on top of that for integration with our control plane.This also adds OBI ebpf generation to the odiglet Dockerfile using the upstream [obi-generator[(https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/pkgs/container/obi-generator)
The OBI distro is added as its own module so it can be imported into enterprise easier: https://github.com/odigos-io/odigos-enterprise/pull/2577 (see that PR for details on OBI distro module)
UI Changes here: https://github.com/odigos-io/ui-kit/pull/748
New C++ app in simple-demo for an e2e test here: odigos-io/simple-demo#67
Changelog entry: Does this PR introduce a user-facing bug fix, feature, dependency update, or breaking change??